Learning Options for an MDP from Demonstrations

نویسندگان

  • Marco Tamassia
  • Fabio Zambetta
  • William L. Raffe
  • Xiaodong Li
چکیده

The options framework provides a foundation to use hierarchical actions in reinforcement learning. An agent using options, along with primitive actions, at any point in time can decide to perform a macro-action made out of many primitive actions rather than a primitive action. Such macro-actions can be hand-crafted or learned. There has been previous work on learning them by exploring the environment. Here we take a different perspective and present an approach to learn options from a set of experts demonstrations. Empirical results are also presented in a similar setting to the one used in other works in this area.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Boosted Bellman Residual Minimization Handling Expert Demonstrations

This paper addresses the problem of batch Reinforcement Learning with Expert Demonstrations (RLED). In RLED, the goal is to find an optimal policy of a Markov Decision Process (MDP), using a data set of fixed sampled transitions of the MDP as well as a data set of fixed expert demonstrations. This is slightly different from the batch Reinforcement Learning (RL) framework where only fixed sample...

متن کامل

Boosted and reward-regularized classification for apprenticeship learning

This paper deals with the problem of learning from demonstrations, where an agent called the apprentice tries to learn a behavior from demonstrations of another agent called the expert. To address this problem, we place ourselves into the Markov Decision Process (MDP) framework, which is well suited for sequential decision making problems. A way to tackle this problem is to reduce it to classif...

متن کامل

Reusing Old Policies to Accelerate Learning onNew

We consider the reuse of policies for previous MDPs in learning on a new MDP, under the assumption that the vector of parameters of each MDP is drawn from a xed probability distribution. We use the options framework, in which an option consists of a set of initiation states, a policy, and a termination condition. We use an option called a reuse option, for which the set of initiation states is ...

متن کامل

Inverse Reinforce Learning with Nonparametric Behavior Clustering

Inverse Reinforcement Learning (IRL) is the task of learning a single reward function given a Markov Decision Process (MDP) without defining the reward function, and a set of demonstrations generated by humans/experts. However, in practice, it may be unreasonable to assume that human behaviors can be explained by one reward function since they may be inherently inconsistent. Also, demonstration...

متن کامل

Imitation Learning with a Value-Based Prior

The goal of imitation learning is for an apprentice to learn how to behave in a stochastic environment by observing a mentor demonstrating the correct behavior. Accurate prior knowledge about the correct behavior can reduce the need for demonstrations from the mentor. We present a novel approach to encoding prior knowledge about the correct behavior, where we assume that this prior knowledge ta...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015